Metabolome in progression to Alzheimer's disease

M Orešič, T Hyötyläinen, S-K Herukka, M Sysi-Aho, I Mattila, T Seppänen-Laakso, V Julkunen, PV Gopalacharyulu,
M Hallikainen, J Koikkalainen, M Kivipelto, S Helisalmi, J Lötjönen and H Soininen

Mild cognitive impairment (MCI) is considered a transition phase between normal aging and Alzheimer's disease (AD). MCI
confers an increased risk of developing AD, although the state is heterogeneous with several possible outcomes, including even
improvement back to normal cognition. We sought to determine the serum metabolomic profiles associated with progression to
and diagnosis of AD in a prospective study. At the baseline assessment, the subjects enrolled in the study were classified into
three diagnostic groups: healthy controls (n = 46), MCI (n = 143) and AD (n = 37). Among the MCI subjects, 52 progressed to AD
in the follow-up. A comprehensive metabolomics approach was applied to analyze baseline serum samples and to associate the
metabolite profiles with the diagnosis at baseline and in the follow-up. At baseline, AD patients were characterized by diminished
ether phospholipids, phosphatidylcholines, sphingomyelins and sterols. A molecular signature comprising three metabolites
was identified, which was predictive of progression to AD in the follow-up. The major contributor to the predictive model
was 2,4-dihydroxybutanoic acid, which was upregulated in AD progressors (P = 0.0048), indicating potential involvement
of hypoxia in the early AD pathogenesis. This was supported by the pathway analysis of metabolomics data, which
identified upregulation of the pentose phosphate pathway in patients who later progressed to AD. Together, our findings primarily
implicate hypoxia, oxidative stress and membrane lipid remodeling in progression to AD. Establishing the pathogenic
relevance of predictive biomarkers such as ours may not only facilitate early diagnosis, but may also help identify new
therapeutic avenues.
Introduction

Alzheimer's disease (AD) is a growing challenge to the health 
care systems and economies of developed countries, with 
millions of patients suffering from this disease and increasing 
numbers of new cases diagnosed annually with the increasing 
age of populations. Mild cognitive impairment (MCI) is
considered a transition phase between normal aging and
AD. A subject with MCI shows cognitive impairment, primarily
in memory functions, yet has preserved activities of daily living
and does not fulfill the criteria of AD or any other dementia
disorder. MCI confers an increased risk of developing AD,
although the state is heterogeneous with several possible
outcomes, including even improvement back to normal
cognition. Recent research has thus concentrated on
obtaining biomarkers to identify features that differentiate
those MCI subjects who will develop AD (progressive
MCI, P-MCI) from subjects with stable MCI (S-MCI) and from
healthy elderly control subjects.

Ideally, AD biomarkers (1) would reflect the disease-related
biological processes and (2) could be measured non-invasively,
such as by a blood test. Molecular markers
sensitive to the underlying pathogenic factors would be highly
relevant not only to assist early disease detection and
diagnosis, but also to facilitate subsequent monitoring of the
disease and of treatment responses. Promising, although
non-overlapping, results have been obtained in two independent
plasma proteomics studies aiming to identify potential markers
predictive of AD. Metabolomics is a discipline dedicated to
the global study of small molecules (i.e., metabolites) in cells,
tissues and biofluids. Concentration changes of specific groups
of metabolites may be sensitive to pathogenically relevant
factors such as genetic variation, diet, age, immune
system status or gut microbiota, and their study may
therefore be a powerful tool for characterization of complex
phenotypes affected by both genetic and environmental
factors. In the past years, technologies have been developed
that allow comprehensive and quantitative investigation of a
multitude of different metabolites.

Among the metabolites, lipids have received most attention,
as all amyloid precursor protein-processing proteins are
transmembrane proteins. Lipids are major constituents
of cell membranes, and their composition is important to
maintain membrane fluidity, topology, mobility or activity
of membrane-bound proteins, and to ensure normal cellular
physiology. Investigations of the disease-related 'lipidome',
covering a global profile of structurally and functionally diverse
lipids, provide an opportunity to pursue, accurately and
sensitively, studies profiling hundreds of molecular lipids in
parallel. The so-called lipidomics approach may not only
provide information about the disease-related markers, but
also deliver clues about the mechanisms behind the
control of cellular lipid homeostasis.

Herein, we sought to determine the serum metabolomic 
profiles associated with progression to and diagnosis of AD 
in a well-characterized prospective study. At the baseline 
assessment, subjects enrolled in the study were classified into 
three diagnostic groups: healthy controls, MCI and AD. A global
metabolomics approach using two platforms with broad
analytical coverage, from lipids to small polar metabolites, 
was applied to analyze baseline serum samples from subjects 
involved in the study, and to associate the metabolite profiles 
with the diagnosis at the baseline and in the follow-up. 



Methods 

Participants. Within the PredictAD project (http://www.predictad.eu/),
focusing on predictors of conversion of MCI
to clinical AD dementia, 143 subjects diagnosed with MCI 
were pooled from longitudinal study databases gathered in 
the University of Kuopio, and their findings were compared 
with those of 46 healthy control subjects and 37 AD 
patients. The blood samples were taken during 
morning hours and after fasting in most cases. A venous 
blood sample was collected into heparin tubes and plasma 
was separated using standard methods. The samples were 
aliquoted and stored in polypropylene tubes at -70 °C until 
analyses. Descriptive and clinical data of the study groups 
are presented in Table 1.

Healthy control subjects included in this study were 
volunteers from the population-based cohorts, and the 
methods used for the identification of control subjects have 
been described in previous studies. They had no history 
of neurological or psychiatric diseases and showed no 
impairment in the detailed neuropsychological evaluation. 

MCI was diagnosed using the criteria originally proposed by
the Mayo Clinic Alzheimer's Disease Research Center.
These criteria have later been modified, but at the time this
study population was recruited, the MCI criteria required were
as follows: (1) memory complaint by patient, family or
physician; (2) normal activities of daily living; (3) normal
global cognitive function; (4) objective impairment in memory
or in one other area of cognitive function as evident by scores
>1.5 s.d. below the age-appropriate mean; (5) Clinical
Dementia Rating (CDR) score of 0.5; and (6) absence
of dementia. As the subjects were pooled from different
study databases with slightly different neuropsychological
test batteries, two scales that had been administered to all the MCI
subjects were selected to describe their cognitive status:
the mini-mental state examination (MMSE) and the CDR sum of
boxes. Although the neuropsychological test battery used to
diagnose MCI varied slightly, all the MCI subjects were
considered to have the amnestic subtype of the syndrome at
the time of recruitment.

Diagnosis of AD included evaluation of medical history, 
physical and neurological examinations performed by a 
physician, and a detailed neuropsychological evaluation. 
The severity of the cognitive decline was graded according 
to the CDR Scale. Brain magnetic resonance imaging scan, 
cerebrospinal fluid (CSF) analysis, electrocardiography, chest 
radiography, screening for hypertension and depression, and 
blood tests were also performed to exclude other possible 
pathologies underlying the symptoms. The diagnosis of
dementia was based on the criteria of the Diagnostic and
Statistical Manual of Mental Disorders, 4th edition, and the
diagnosis of AD on the National Institute of Neurologic and
Communicative Disorders and Stroke, and Alzheimer's
Disease and Related Disorders Association criteria. All
magnetic resonance images were also read by an experienced
neuroradiologist to exclude subjects with severe white
matter lesions or other abnormalities. Subjects with
a history of stroke or transient ischemic attack were excluded,
as were subjects with extensive confluent white
matter lesions.

MCI subjects who developed AD during the course of the
follow-up were considered as P-MCI subjects (n = 52) and
those whose status remained stable or improved (i.e., those
who were later diagnosed as controls) were considered as
having S-MCI (n = 91). The follow-up time for the P-MCI
subjects (27 ± 18 months, Table 1) was set to start at the
baseline date and considered completed at the time of AD
diagnosis. In the case of S-MCI subjects, the follow-up time
(28 ± 16 months, Table 1) was calculated as the time from
baseline date to the last available evaluation date. For all
subjects magnetic resonance images were acquired with



Table 1 Descriptive statistics of the study population at baseline

| | Control | Stable MCI | Progressive MCI | AD |
|---|---|---|---|---|
| N (total = 226) | 46 | 91 | 52 | 37 |
| Gender, male/female (%) | 21/25 (46/54) | 32/59 (35/65) | 15/37 (29/71) | 17/20 (46/54) |
| Age at baseline, years (± s.d.) | 71 ± 6 | 72 ± 5 | 71 ± 6 | 75 ± 4 (a) |
| Education, years (± s.d.) | 7 ± 2 | 7 ± 2 | 7 ± 3 | 7 ± 3 |
| MMSE (± s.d.) | 25.8 ± 2.2 | 24.6 ± 3.0 (b) | 23.7 ± 2.7 (c) | 20.5 ± 2.9 (d) |
| Follow-up time, months (± s.d.) | 31 ± 17 | 28 ± 16 | 27 ± 18 | |
| APOE ε2/ε3/ε4, % | 0/87/13 | 4/74/22 | 3/59/38 (e) | 0/65/35 (f) |

Abbreviations: AD, Alzheimer's disease; CI, confidence intervals; MCI, mild cognitive impairment.
(a) P < 0.01 against control, stable MCI and progressive MCI.
(b) P = 0.03 against control.
(c) P < 0.001 against control and P = 0.03 against stable MCI.
(d) P < 0.001 against control, stable MCI and progressive MCI.
(e) χ²-tests P < 0.001 for ε4 allele against control with odds ratio 4.0 (CI 2.0-8.3) and P < 0.01 against stable MCI with odds ratio 2.2 (CI 1.3-3.7).
(f) χ²-tests P = 0.001 for ε4 allele against control with odds ratio 3.5 (CI 1.6-7.6) and P = 0.02 against stable MCI with odds ratio 1.9 (CI 1.1-3.5).
a 1.5 T magnetic resonance imaging scanner in the Department of
Clinical Radiology, Kuopio University Hospital. The apolipoprotein E
(APOE) genotype of the study subjects was determined
by using a standard protocol. The APOE allelic
distribution within the study groups is presented in Table 1.

Informed written consent was acquired from all the subjects 
according to the Declaration of Helsinki, and the study was 
approved by the Ethics Committee of Kuopio University Hospital. 

Metabolomic analysis. Two analytical platforms for metabolomics
were applied to all samples from the estimation
cohort: (1) a global lipidomics platform, based on ultra performance
liquid chromatography coupled to mass spectrometry
(MS), which covers molecular lipids such as phospholipids,
sphingolipids and neutral lipids; and (2) a platform for global
profiling of small polar metabolites, based on comprehensive
two-dimensional gas chromatography coupled to
time-of-flight mass spectrometry (GC×GC-TOFMS), which covers
small molecules such as amino acids, free fatty acids, ketoacids,
various other organic acids, sterols and sugars.
Both platforms were recently described in detail and
are also described in Supplementary Methods. Raw ultra
performance liquid chromatography coupled to MS and
GC×GC-TOFMS data were processed with MZmine 2
and Guineu software, respectively. The final data set
from each platform consisted of a list of metabolite peaks 
(identified or unidentified) and their levels, calculated using 
the platform-specific methods, across all samples. All meta- 
bolite peaks were included in the data analyses, including the 
unidentified ones. We reasoned that inclusion of complete 
data as obtained from the platform best represents the global 
metabolome, and the unidentified peaks may still be 
followed-up later on with de novo identification, using 
additional experiments if considered of interest. 

Descriptive statistical analyses. Statistical analyses for
clinical data were performed with SPSS software release
14.0.1 for Windows (SPSS, Chicago, IL, USA). The
comparisons between the different study groups were done
by independent samples t-test. If the assumptions
of normality were not met, non-parametric tests were
used instead. For the categorical data, the comparisons between
different groups were made using χ²-tests.

One-way analysis of variance (ANOVA), implemented in
Matlab (MathWorks, Natick, MA, USA), was applied to
compare the average within-cluster metabolite profiles between
the diagnostic groups. The statistical analyses at
individual metabolite level were performed using R version
2.13. The median values of metabolites across the three
diagnostic groups at baseline were compared using the
Kruskal-Wallis one-way ANOVA, whereas the medians of
P-MCI and S-MCI groups were compared by Wilcoxon test.
Individual metabolite levels were visualized using
beanplots, implemented in the 'beanplot' R package. A beanplot
provides information on the mean metabolite level within
each group and the density of the data-point distribution, and
also shows individual data points.

Cluster analysis. The data were scaled to zero mean and
unit variance, to obtain metabolite profiles comparable to
each other. Bayesian model-based clustering was applied
on the scaled data to group lipids that were similarly
expressed across all samples. The analyses were performed
using the MCLUST method, implemented in the R statistical
language as package 'mclust'. In MCLUST, the observed
data are viewed as a mixture of several clusters and each
cluster comes from a unique probability density function.
A number of clusters in the mixture, together with the 
cluster-specific parameters that constrain the probability 
distributions, will define a model, which can then be 
compared with others. The clustering process selects the 
optimal model and determines the data partition accordingly. 
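
The scaling step that precedes the clustering can be illustrated with a minimal sketch. It is not the authors' code: the function name `zscore` is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev

def zscore(values):
    """Scale one metabolite's levels to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population s.d.; sample s.d. is equally plausible
    if sigma == 0:
        raise ValueError("a constant profile cannot be scaled to unit variance")
    return [(v - mu) / sigma for v in values]

# Illustrative values only, not study data
scaled = zscore([2.0, 4.0, 6.0])
```

After this step every metabolite has the same mean and spread, so the clustering responds to the shape of a profile across samples rather than to its absolute concentration.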

A number of clusters ranging from 4-15 and all available 
model families were considered in our study. Models were 
compared using the Bayesian information criterion, which is 
an approximation of the marginal likelihood. The best model is 
the one that gives the largest marginal likelihood of data, that 
is, the highest Bayesian information criterion value. 
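
As a rough illustration of this model-selection rule, the sketch below computes the Bayesian information criterion in the "larger is better" convention described above (2 log L minus k log n) and picks the candidate with the highest value. The function names and the candidate fits are hypothetical; the numbers are invented for illustration and do not come from the study.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """BIC in the 'larger is better' convention: 2*logL - k*log(n)."""
    return 2.0 * log_likelihood - n_params * math.log(n_obs)

def select_model(candidates, n_obs):
    """candidates: list of (label, log_likelihood, n_params) tuples."""
    return max(candidates, key=lambda c: bic(c[1], c[2], n_obs))[0]

# Hypothetical fits: (number of clusters, maximized log-likelihood, free parameters)
fits = [(4, -1250.0, 20), (5, -1210.0, 25), (6, -1205.0, 30)]
best = select_model(fits, n_obs=226)
```

In this made-up example the six-cluster model has the highest likelihood, but its extra parameters are penalized enough that the five-cluster model is preferred.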

Diagnostic model. The best marker combination was
searched for in two phases: in the first phase, penalized,
generalized linear models were used to pre-screen a
prominent marker set, and in the second phase, a step-wise
optimization algorithm was used to optimize the marker
combination. In both phases, 1000 cross-validation runs
were performed. In each run, two out of three and one out of
three of the samples were selected at random to the training
and test sets, respectively. In the first phase, markers leading
to the lowest cross-validation errors were selected.

In the second phase, a logistic regression model implemented
in R was applied to discriminate the groups of interest. The
best marker combination in the logistic regression model was 
selected by step-wise algorithm using Akaike's information 
criterion. The best model was then applied to the test set 
samples to calculate their predicted classes. The optimal 
marker combinations in each of the cross-validation runs, 
receiver-operating characteristic curves with area under the 
curve (AUC) statistics, odds ratios and relative risks were 
recorded. Different biomarker signatures were then compared 
on the basis of the number of times they were selected as the 
best performing models. The performance of the top-ranking 
signature was then reported using the same procedure as 
above, but only considering the selected combination of 
metabolites. Receiver-operating characteristic curves with 
AUC statistics, prediction accuracy, odds ratios and relative 
risks were recorded on the basis of the performance in the 
independently tested data (one out of three of the samples) for 
each of the 2000 cross-validation runs. 
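
The resampling scheme can be sketched as follows. This is a simplified illustration, not the authors' procedure: it evaluates a fixed score on the held-out third and omits the model fitting on the training two-thirds (penalized and step-wise logistic regression in the study). All function names are hypothetical.

```python
import random

def auc(pos_scores, neg_scores):
    """Area under the ROC curve as P(score_pos > score_neg), ties counted as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos_scores) * len(neg_scores))

def split_two_thirds(n, rng):
    """Randomly assign 2/3 of n sample indices to training, the rest to testing."""
    idx = list(range(n))
    rng.shuffle(idx)
    cut = (2 * n) // 3
    return idx[:cut], idx[cut:]

def repeated_holdout_auc(scores, labels, runs=1000, seed=0):
    """Mean test-set AUC of a fixed score over repeated random splits."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(runs):
        _, test = split_two_thirds(len(scores), rng)
        pos = [scores[i] for i in test if labels[i] == 1]
        neg = [scores[i] for i in test if labels[i] == 0]
        if pos and neg:  # AUC is undefined unless both classes are present
            aucs.append(auc(pos, neg))
    return sum(aucs) / len(aucs)
```

Because performance is recorded only on samples that were not used for training in a given run, the spread of the per-run AUC values also gives a natural basis for the confidence intervals reported in the Results.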

Different models (for example, a model based on metabolites
alone versus a model based on APOE genotype as well as
metabolites) were compared using the likelihood ratio test,
which expresses how many times more likely the data are
under one model than the other, to compare their fit with the
data.
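
For two nested models that differ by a single parameter (for instance, a metabolite model with and without a binary APOE ε4 term), the likelihood ratio test can be sketched as below. This is an illustrative assumption about the set-up rather than the authors' implementation; the function name is hypothetical, and the one-degree-of-freedom chi-square tail is computed from the complementary error function.

```python
import math

def likelihood_ratio_test_df1(loglik_null, loglik_alt):
    """LR test for nested models differing by one parameter.

    Returns (statistic, p_value); the statistic is compared with a
    chi-square distribution with 1 degree of freedom.
    """
    stat = 2.0 * (loglik_alt - loglik_null)
    if stat < 0:
        raise ValueError("the larger model cannot fit worse than the nested one")
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) survival function
    return stat, p_value

# Hypothetical log-likelihoods, for illustration only
stat, p = likelihood_ratio_test_df1(-100.0, -98.0)
```

A small P-value here would indicate that the added term improves the fit more than expected by chance, which is how the comparisons with and without age or APOE genotype in the Results can be read.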

Pathway analysis. MPEA (metabolic pathway enrichment
analysis) is a tool for functional analysis and biological
interpretation of metabolic profiling data generated by GC-MS.
The concept of MPEA is the same as that of the widely
accepted gene set enrichment analysis. MPEA accepts a
ranked list of mass spectra and tests whether metabolites
belonging to some metabolic pathway tend to occur toward
the top (or bottom) of this ranked mass chromatogram.
Herein, MPEA was applied using the default parameters:
permutations = 100, kselection = 1, penalty_mode = 0,
organism = HSA, gsea = 1, direction = 2, list_size = all,
column = VAR5, column_width = 25, dotproduct = 0.05,
euclideandist = 0.05, hammingdist = 50, jaccarddist = 0.6,
binarydist = 0.6.
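
The core of an enrichment test of this kind is a hypergeometric tail probability: given a ranked list, how likely is it to see at least the observed number of pathway members among the top-ranked features by chance? The sketch below shows only that calculation. It is not MPEA's code, and the function name and arguments are hypothetical.

```python
from math import comb

def hypergeom_tail(n_total, n_pathway, n_top, n_hits):
    """P(X >= n_hits) when n_top items are drawn without replacement from
    n_total items, of which n_pathway belong to the pathway."""
    denom = comb(n_total, n_top)
    upper = min(n_pathway, n_top)
    return sum(
        comb(n_pathway, k) * comb(n_total - n_pathway, n_top - k)
        for k in range(n_hits, upper + 1)
    ) / denom

# Toy example: all 5 top-ranked features out of 10 belong to a 5-member pathway
p_value = hypergeom_tail(n_total=10, n_pathway=5, n_top=5, n_hits=5)
```

In an enrichment analysis this probability would be minimized over cut-off ranks and then corrected by permutation, which corresponds to the P and Pcorr columns reported in Table 3.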



Results 

Metabolomics in a prospective cohort. Using the two
analytical platforms, a total of 139 molecular lipids and
544 small polar metabolites were measured, respectively,
from 226 serum samples (Table 1). Due to a high degree of
co-regulation among the metabolites, one cannot assume
that all the 683 measured metabolites are independent.
For this reason the global metabolome was first surveyed by 
clustering the data into a subset of clusters, using the 
Bayesian model-based clustering. Such an approach 
decomposes the metabolome into specific clusters of co- 
varying metabolites. The so-obtained clusters and their 
average levels across different sample groups provide a 
global view of the main metabolic changes. As a potential 
disadvantage, such analysis may miss potentially interesting 
outlier metabolites, which greatly vary between the diag- 
nostic groups, but are not well represented by any of the 
average cluster profiles. 

The lipidomic platform data were decomposed into seven clusters (LCs)
and the GC×GC-TOFMS-based metabolomic data into six
clusters (MCs), respectively.

Description of each cluster and the representative metabolites
are shown in Table 2. As expected, the division of clusters
to a large extent follows different metabolite functional or
structural groups. As shown in Figure 1a (and in Figure 1b for
selected representative identified metabolites), several of
the clusters had different average metabolite profiles across
the three diagnostic groups at the baseline. Specifically, there
was an overall trend towards lower lipid levels in AD, with the
highest levels in the control group (LCs 3-7). The differences
of average within-cluster profiles between the three groups
reached the significance level in LC1, LC3 (both containing
predominantly phosphatidylcholines (PC)) and LC4 (consisting
predominantly of ether phospholipids, including plasmalogens).
When corrected for age and APOE genotype, only
LC4 remained statistically significant, whereas LC1 was
marginally significant (P = 0.07). Among the metabolites,
MC3 was different between the diagnostic groups at baseline
at a marginal significance level, but was not significant
after correction for age and APOE genotype. The two large
clusters, MC1 and MC2, did not change on average between
the groups, but did contain several significantly changing
metabolites.

Feasibility of diagnosis and prediction of AD. To assess
the feasibility of diagnosis, we performed a model selection in
multiple cross-validation runs as described in the 'Methods'
section. The best model derived from logistic regression
analysis was obtained by combining four metabolites: two PCs
(PC (18:0/18:2) from LC1 and PC (16:0/20:4) from LC5),
lactic acid (MC2; PubChem CID 61503) and ketovaline
(MC3; PubChem CID 49). This combination was selected in
248 out of 1000 cross-validation runs. The next three strongly
performing models, which were together selected in 275 out
of 1000 cross-validation runs, were closely related, as they
contained subsets of two or three metabolites of the
top-ranking model. The model performed reasonably well, with
AUC = 0.77, 90% CI = (0.66, 0.88). Sensitivity and specificity
on the basis of the optimal cut-off point were 0.64, 90% CI =
(0.40, 0.85) and 0.72, 90% CI = (0.56, 0.86), respectively.
Supplementary Figure S1 shows the receiver-operating
characteristic curve of the diagnostic model comprising the
four metabolites, based on the independently tested data
taken from 2000 samplings.
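
The text does not state how the optimal cut-off point was chosen. One common choice, used here purely as an assumption, is the cut-off that maximizes Youden's index (sensitivity + specificity - 1). The sketch below uses hypothetical function names and invented scores.

```python
def sens_spec(pos_scores, neg_scores, cutoff):
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    sens = sum(s >= cutoff for s in pos_scores) / len(pos_scores)
    spec = sum(s < cutoff for s in neg_scores) / len(neg_scores)
    return sens, spec

def youden_cutoff(pos_scores, neg_scores):
    """Cut-off maximizing sensitivity + specificity - 1 over the observed scores.

    Assumption: Youden's index; the source does not name its criterion.
    """
    candidates = sorted(set(pos_scores) | set(neg_scores))
    return max(candidates, key=lambda c: sum(sens_spec(pos_scores, neg_scores, c)) - 1)

# Invented model outputs for AD (positive) and control (negative) subjects
cutoff = youden_cutoff([0.6, 0.7, 0.9], [0.1, 0.2, 0.3])
```

Other criteria, such as the point closest to the top-left corner of the ROC curve, can give a different cut-off and therefore different sensitivity and specificity for the same AUC.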

We also included age and APOE genotype (APOE ε4
genotype present or absent) in the diagnostic model. APOE or
age alone performed worse than the metabolic signature
(P < 0.001). For the model based on APOE genotype alone,


Table 2 Metabolome and lipidome cluster descriptions

| Cluster name | Cluster size | Cluster description | P baseline diagnosis (a) | Examples of metabolites |
|---|---|---|---|---|
| LC1 | 14 | PCs containing linoleic acid (C18:2n6) | 0.0345* | PC (16:0/18:2), PC (18:0/18:2) |
| LC2 | 10 | LysoPCs | 0.9365 | LysoPC (16:0), lysoPC (18:0) |
| LC3 | 31 | Palmitate and stearate containing PCs | 0.0188* | PC (16:0/18:1), PC (16:0/20:3), PC (16:0/16:0), PC (18:0/18:1) |
| LC4 | 29 | Ether PCs | 0.0135* | PC (O-18:1/16:0), PC (O-18:1/18:2) |
| LC5 | 6 | AA containing PCs and PEs | 0.1190 | PC (16:0/20:4), PC (18:0/20:4), PE (18:0/20:4) |
| LC6 | 13 | EPA and DHA containing PCs | 0.2776 | PC (16:0/22:6), PC (18:0/22:6), PC (16:0/20:5) |
| LC7 | 32 | Sphingomyelins | 0.1106 | SM (d18:1/24:1), SM (d18:1/16:0) |
| MC1 | 176 | Diverse, including free fatty acids, TCA cycle metabolites | 0.5900 | 2-ketobutyric acid, citric acid, succinic acid, myristic acid, stearic acid, oleic acid, threonic acid |
| MC2 | 299 | Diverse, including amino acids, sterols | 0.2693 | Cholesterol, sitosterol, campesterol, lactic acid, pyruvic acid, glycine |
| MC3 | 31 | Amino acids, ketoacids | 0.0516 | Ketovaline, glutamine, ornithine |
| MC4 | 3 | Branched-chain amino acids | 0.5491 | Valine, leucine, isoleucine |
| MC5 | 32 | Diverse | 0.2169 | Histamine, pyroglutamic acid, glutamic acid |
| MC6 | 3 | Unknown | 0.1392 | |

Abbreviations: AA, arachidonic acid; DHA, docosahexaenoic acid; EPA, eicosapentaenoic acid; lysoPC, lysophosphatidylcholine; PC, phosphatidylcholine.
(a) ANOVA across the control, MCI and AD diagnostic groups at baseline.
P < 0.05 marked with an asterisk (bold in the original).



Translational Psychiatry 






Figure 1 Metabolomic profiles across the three diagnostic groups at baseline. (a) Mean metabolite levels within each cluster (LC1-LC7 and MC1-MC6) in the control, MCI and AD groups. Error marks show s.e.m. (*P < 0.05). When
correcting for age and APOE genotype, only LC4 remained statistically significant, whereas LC1 was marginally significant (P = 0.07). (b) Profiles of selected representative
metabolites from different clusters in control and Alzheimer's disease (AD) groups at baseline. The metabolite levels are shown as beanplots, which provide information on
the mean level (solid line), individual data points (short lines), and the density of the distribution. The concentration scale in beanplots is logarithmic for some metabolites.
Panels in (b): PC(16:0/18:2) [LC1], PI(18:0/20:4) [LC3], PC(O-18:0/18:2) [LC4], PC(18:0/20:4) [LC5], SM(d18:1/24:0) [LC7], 2-ketobutyric acid [MC1], sitosterol [MC2], ketovaline [MC3] and histamine [MC5].



AUC = 0.61, 90% CI = (0.49, 0.73; Supplementary Figure 2).
Combining metabolic signature and APOE genotype
did not improve the model (P = 0.48) (Supplementary Figure
3). However, combining age alone, or age and APOE
genotype together, with the metabolic signature did
improve the model (P = 0.006 and P = 0.019, respectively;
Supplementary Figures 4 and 5). The best performing
model was based on the metabolite signature together with
age, with AUC = 0.81, 90% CI = (0.69, 0.91), sensitivity of
0.67, 90% CI = (0.44, 0.90) and specificity of 0.76, 90%
CI = (0.60, 0.89).

We also tested if any of the patients in the progressive MCI
group had the AD metabolic profile. When applying the AD
versus control group classification to the P-MCI group, 12 MCI
patients (24%) who later progressed to AD were identified as
having the AD metabolic signature.



We then investigated the feasibility of prediction of AD by
comparing stable and progressive MCI groups on the basis of
metabolomics profiles at baseline. Using the same approach
as above, the best model contained three metabolites: PC
from LC3 (PC (16:0/16:0)), an unidentified carboxylic acid
(MC2) and 2,4-dihydroxybutanoic acid (MC1; PubChem CID
192742). The top model was selected in 195 out of 1000
cross-validation runs. Other best-selected models contained
the two metabolites (carboxylic acid and 2,4-dihydroxybutanoic
acid), but with varying lipids (including lysoPC (16:0), PC
(16:0/20:5), PC (18:0/20:4) or PC (O-18:1/16:0)), or without.

The metabolic signature obtained predicted AD reasonably 
well, with AUC = 0.77, 90% CI = (0.65, 0.87), sensitivity of 
0.77, 90% CI = (0.53, 1.00), specificity of 0.70, 90% 
CI = (0.53, 0.86) and odds ratio of 8.0, 90% CI = (2.7, 27.6). 
Figure 2 shows the receiver-operating characteristic curve of 
P-MCI vs. S-MCI at follow-up
Figure 2 Feasibility of predicting Alzheimer's disease (AD), based on concentrations of three metabolites (2,4-dihydroxybutanoic acid, unidentified carboxylic acid,
phosphatidylcholine (PC (16:0/16:0))) in subjects at baseline, who were diagnosed with mild cognitive impairment (MCI). (a) The characteristics of the model were determined
by independent testing in one out of three of the samples across 2000 cross-validation runs. (b) Beanplots of the three metabolites included in the model. (c) Two-dimensional
gas chromatography coupled to time-of-flight mass spectrometry (GC×GC-TOFMS) spectra of the two metabolites included in the model, 2,4-dihydroxybutanoic acid and an
unidentified carboxylic acid. Acc = classification accuracy; AUC = area under the receiver operating characteristic (ROC) curve; OR = odds ratio.



the combined diagnostic model comprising three metabolites, 
based on the independently tested data taken from 2000 
samplings. Supplementary Figure 6 shows the levels of 
the three metabolites included in the biomarker across all 
four patient groups included in the study. Interestingly, the 
increase of 2,4-dihydroxybutanoic acid concentration appears 
to be specific to the P-MCI group, whereas none of the 
metabolites display the progressive changes from healthy 
controls to AD. 

APOE genotype alone was a poor predictor of progression 
from MCI to AD in comparison with the predictive metabolic 
biomarker (P< 0.001), with AUC = 0.59, 90% CI = (0.47, 
0.70). Addition of APOE genotype to the metabolic signature 
did not significantly improve the predictive model (P=0.15), 
with AUC = 0.75, 90% CI = (0.63, 0.85; Supplementary 
Figure 7). 

Metabolic pathways behind progression to AD. Next, we
investigated which metabolic pathways may be behind the
observed metabolic profile changes found to be associated
with AD and with progression to AD. We applied the pathway
analysis of GC×GC-TOFMS data using MPEA, aiming to
identify sets of metabolites belonging to specific metabolic
pathways, which are significantly different between (1)
controls and AD groups at baseline (Figure 1 and
Supplementary Figure S1) or (2) S-MCI and P-MCI groups at
baseline (Figure 2). The results are shown in Table 3. The
only significantly altered pathway following the P-value
correction was the pentose phosphate pathway when
comparing P-MCI and S-MCI groups. Of relevance to this pathway,
the concentration of ribose-5-phosphate was decreased in
the P-MCI group (P = 0.046), whereas lactic acid (P = 0.040)
and pyruvic acid (P = 0.058) were increased.

Discussion 

Our findings, based on a well-phenotyped population,
associate specific metabolic abnormalities with progression
to AD. Our non-targeted methods cover a representative part
of the main metabolic pathways, thus allowing the determination
of main intermediates of lipid metabolism, energy
metabolism (tricarboxylic acid cycle, gluconeogenesis,
ketogenesis) and nitrogen metabolism.

At the baseline, patients diagnosed with AD had decreased
concentrations of several lipid classes, including PC,
plasmalogens, sphingomyelins and sterols. Plasmalogens are
ether phospholipids, which are enriched in polyunsaturated
fatty acids, and are abundant in the brain. They have been
found diminished in AD in multiple previous studies, as
well as in normal aging. The diminishment of sphingomyelins
and sterols is also in line with earlier findings implicating altered
sterol and sphingomyelin metabolism in AD. A recent
study suggests that the ε4 allele of APOE (APOE4), a major risk
allele of AD, is associated with disruption of sterol and
Table 3 Pathway analysis of metabolomics data from the GC×GC-TOFMS platform

| KEGG ID | Pathway name | Size | P-MCI vs S-MCI: N/Nall | Medium-K | P | Pcorr | AD vs controls: N/Nall | Medium-K | P | Pcorr |
|---|---|---|---|---|---|---|---|---|---|---|
| map00030 | Pentose phosphate pathway | 28 | 2/(32) | 2 | 0.000130 | 0.09* | 15/(434) | 3 | 0.000580 | 0.46 |
| map00051 | Fructose and mannose metabolism | 28 | 18/(466) | 2 | 0.017702 | 0.91 | 10/(281) | 2 | 0.007617 | 0.43 |
| map00052 | Galactose metabolism | 33 | 18/(466) | 2 | 0.024189 | 0.93 | 14/(359) | 2 | 0.054227 | 0.50 |
| map00061 | Fatty acid biosynthesis | 48 | 19/(489) | 3 | 0.005718 | 0.99 | 19/(538) | 2 | 0.019644 | 0.99 |
| map00520 | Amino sugar and nucleotide sugar metabolism | 66 | 18/(466) | 2 | 0.085056 | 0.87 | 4/(159) | 2 | 0.002265 | 0.71 |
| map00710 | Carbon fixation in photosynthetic organisms | 22 | 18/(466) | 2 | 0.011108 | 0.91 | 18/(511) | 3 | 0.004883 | 0.82 |
| map01040 | Biosynthesis of unsaturated fatty acids | 48 | 19/(489) | 3 | 0.005718 | 0.99 | 15/(434) | 2 | 0.007750 | 0.63 |
| map01100 | Metabolic pathways | 1059 | 7/(120) | 3 | 0.661475 | 0.25 | 15/(434) | 3 | 0.986924 | 0.91 |
| map01110 | Biosynthesis of secondary metabolites | 472 | 5/(81) | 2 | 0.253492 | 0.15 | 15/(434) | 3 | 0.585593 | 0.60 |

'KEGG ID' is the KEGG identifier of the pathway, 'Pathway name' is the name of the pathway given by KEGG and 'Size' is the number of metabolites that belong to
a particular pathway. 'Medium-K' is the number of metabolites within the data set assigned to the pathway, after pathway inconsistencies have been corrected, and
'N/Nall' is the rank at which the minimum P-value was obtained using features associated to KEGG (N) and all features (Nall), respectively. P is the P-value given by
the hypergeometric distribution and Pcorr is the corresponding permutation-corrected P-value.
P-values for Pcorr < 0.1 marked with an asterisk (bold in the original).



sphingolipid metabolism. Given that the affected lipids are major
constituents of lipid membranes, their compositional variation
with age and in disease likely affects membrane
fluidity and protein mobility. This is particularly relevant
given recent evidence that truncated amyloid β fragments
may dynamically form ion channels and may thereby affect the
uptake of ions such as calcium into the cells. The membrane
lipid milieu may thus be an important contributing factor
modulating the dynamics of Aβ self-assembly.

Plasmalogens, via the vinyl-ether bond, also act as endogenous
antioxidants that protect cells from reactive oxygen
species, and their diminishment in AD is in line with the
hypothesis implicating oxidative stress in AD
pathogenesis. In agreement with earlier studies, circulating
histamine was elevated in patients diagnosed with
AD.57,58 Histamine stimulates production of nitric oxide,
and thus the activation of the histaminergic system may also
contribute to the pathology of AD.

A metabolite biomarker signature was identified, which
was predictive of progression to AD (Figure 2). The major
contributing metabolite in the marker panel separating P-MCI
and S-MCI patients was 2,4-dihydroxybutanoic acid.
Interestingly, this organic acid is a major component of CSF, but
is found in plasma at concentrations nearly two orders of
magnitude lower than in CSF. Very few data are available on
the biochemistry of 2,4-dihydroxybutanoic acid. In one report,
this metabolite was overproduced under low oxygen conditions
from D-galacturonic acid, a uronic acid, which is a
stereoisomer of glucuronic acid. The concentration of glucuronic
acid was decreased at a marginal significance level in the
P-MCI group in our study (P = 0.10). In support of an
involvement of hypoxia, there were significant differences in the
pentose phosphate pathway as shown by pathway analysis,
including a decrease of ribose-5-phosphate and an increase of
lactic acid, an end product of glycolysis. It is known that under
hypoxic conditions in the brain, more glucose is metabolized
via the pentose phosphate pathway. Studies in APP23
transgenic mice have in fact shown that hypoxia facilitates
progression to AD.
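
The hypergeometric P-value reported for each pathway in Table 3 can be illustrated with a minimal sketch. The function name, its parameters and the example counts are hypothetical and are not taken from our data; the permutation correction that yields Pcorr is not shown.

```python
from math import comb


def hypergeom_enrichment_p(n_total, n_pathway, n_selected, n_hits):
    """Upper-tail hypergeometric probability P(X >= n_hits).

    n_total    -- number of metabolites in the background set (assumed)
    n_pathway  -- number of background metabolites annotated to the pathway
    n_selected -- number of top-ranked metabolites considered
    n_hits     -- number of selected metabolites that fall in the pathway
    """
    max_hits = min(n_pathway, n_selected)
    denominator = comb(n_total, n_selected)
    numerator = sum(
        comb(n_pathway, k) * comb(n_total - n_pathway, n_selected - k)
        for k in range(n_hits, max_hits + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts: 200 metabolites measured, 5 of them in the pathway,
    # 20 top-ranked metabolites, 2 of which belong to the pathway.
    print(hypergeom_enrichment_p(200, 5, 20, 2))
```

A small P-value indicates that the pathway is over-represented among the top-ranked metabolites relative to what random selection would give.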

The study setting with a prospective cohort of carefully
characterized and followed-up subjects is a definite strength
of the present study. This allowed us to identify the patients
diagnosed with MCI who later progressed to AD, and to
derive the molecular signature, which can identify such
patients at baseline. In a health care setting, application of
such a biochemical assay could therefore complement the
neurocognitive assessment by the medical doctor and could
be applied to identify the at-risk patients in need of further
comprehensive follow-up. As a potential limitation of our
study, the relatively small sample size did not allow us to split
our sample into two independent cohorts. As an alternative,
we performed an implicit validation by carrying out model
selection over a large number of randomly selected subsets of
samples, each time independently validating the model
in the rest of the sample. The most commonly selected model
was then chosen as our metabolic signature. This approach
allowed us to estimate and report the distribution of model
performances, and not only that of the most optimistic model,
therefore providing a reasonable estimate of how the
model may perform in an independent validation setting.
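
The resampling scheme described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis code used in the study: the nearest-centroid classifier, the accuracy score, model selection by training accuracy, the function names and the toy data are all hypothetical stand-ins.

```python
import random
from collections import Counter
from itertools import combinations


def fit(train, features):
    """Nearest-centroid model: one centroid per class on the chosen features."""
    by_class = {}
    for x, y in train:
        by_class.setdefault(y, []).append(x)
    return {
        y: [sum(row[f] for row in rows) / len(rows) for f in features]
        for y, rows in by_class.items()
    }


def accuracy(model, data, features):
    """Fraction of samples assigned to the class with the closest centroid."""
    correct = 0
    for x, y in data:
        predicted = min(
            model,
            key=lambda c: sum((x[f] - m) ** 2 for f, m in zip(features, model[c])),
        )
        correct += predicted == y
    return correct / len(data)


def repeated_subsampling(data, candidates, n_rounds=200, train_frac=0.67, seed=0):
    """Select a model on each random subset and validate it on the remainder."""
    rng = random.Random(seed)
    selected = Counter()
    held_out_scores = []
    n_train = int(len(data) * train_frac)
    for _ in range(n_rounds):
        shuffled = data[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        # Model selection sees the training subset only.
        best = max(candidates, key=lambda f: accuracy(fit(train, f), train, f))
        selected[best] += 1
        # Performance is measured on the samples left out of this round.
        held_out_scores.append(accuracy(fit(train, best), test, best))
    return selected, held_out_scores


def make_toy_data(n=60, seed=1):
    """Synthetic data: feature 0 separates the classes, features 1-2 are noise."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        x = [rng.gauss(4.0 * label, 1.0), rng.gauss(0, 1), rng.gauss(0, 1)]
        data.append((x, label))
    return data


if __name__ == "__main__":
    candidates = [c for r in (1, 2) for c in combinations(range(3), r)]
    selected, scores = repeated_subsampling(make_toy_data(), candidates, n_rounds=50)
    print("most commonly selected feature set:", selected.most_common(1)[0][0])
    print("mean held-out accuracy:", sum(scores) / len(scores))
```

The counter of selected models identifies the most commonly chosen candidate, and the list of held-out scores gives the distribution of performances rather than a single optimistic value.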

In conclusion, we have identified metabolic profile changes 
of potential pathogenic relevance in progression to and overt 
AD. Our findings primarily implicate the roles of hypoxia, 
oxidative stress, as well as membrane lipid remodeling in AD. 
Given that the key metabolite from the metabolic signature
predictive of progression to AD is abundant in CSF, further
investigations should, in addition to its validation in other 
cohort studies, also include metabolomic studies in CSF, as 
well as in experimental models. Establishment of pathogenic 
relevance of predictive biomarkers such as ours may not only 
facilitate early diagnosis, but may also help identify new 
therapeutic avenues. 